Working Close to Home: WIRE-Net's Hire Locally Program
Hire Locally is an employment program that matches residents of Cleveland's west side with industrial jobs that employers would otherwise have had to search far and wide to fill. The program is part of the nonprofit Westside Industrial Retention and Expansion Network, or WIRE-Net. This report documents the program's innovation in developing a sectoral strategy to meet labor market demands while also setting a broad agenda for community improvement. It also shares key program elements and recommendations to ensure that future programs are more effective.
Optimal Estimation and Rank Detection for Sparse Spiked Covariance Matrices
This paper considers sparse spiked covariance matrix models in the
high-dimensional setting and studies the minimax estimation of the covariance
matrix and the principal subspace as well as the minimax rank detection. The
optimal rate of convergence for estimating the spiked covariance matrix under
the spectral norm is established, which requires significantly different
techniques from those for estimating other structured covariance matrices such
as bandable or sparse covariance matrices. We also establish the minimax rate
under the spectral norm for estimating the principal subspace, the primary
object of interest in principal component analysis. In addition, the optimal
rate for the rank detection boundary is obtained. This result also resolves the
gap in a recent paper by Berthet and Rigollet [1] where the special case of
rank one is considered.
Optimal Rates of Convergence for Noisy Sparse Phase Retrieval via Thresholded Wirtinger Flow
This paper considers the noisy sparse phase retrieval problem: recovering a
sparse signal $x \in \mathbb{R}^p$ from noisy quadratic measurements
$y_j = (a_j' x)^2 + \epsilon_j$, $j = 1, \ldots, m$, with independent
sub-exponential noise $\epsilon_j$. The goals are to understand the effect of
the sparsity of $x$ on the estimation precision and to construct a
computationally feasible estimator to achieve the optimal rates. Inspired by
the Wirtinger Flow [12] proposed for noiseless and non-sparse phase retrieval,
a novel thresholded gradient descent algorithm is proposed and it is shown to
adaptively achieve the minimax optimal rates of convergence over a wide range
of sparsity levels when the $a_j$'s are independent standard Gaussian random
vectors, provided that the sample size is sufficiently large compared to the
sparsity of $x$.
Comment: 28 pages, 4 figures
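As a rough illustration of what a thresholded gradient step can look like, the sketch below assumes measurements of the form y_j = (a_j' x)^2 + noise, a least-squares loss on the squared projections, and soft-thresholding with a fixed level `tau`. The function names, the fixed threshold, the step size, and the simulated data are all assumptions made for this example; the paper's own thresholding rule and initialization are not reproduced here.

```python
import numpy as np

def soft_threshold(v, tau):
    """Shrink each entry toward zero by tau; entries with |v_i| <= tau become 0."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def loss_gradient(x, A, y):
    """Gradient of f(x) = (1 / (4m)) * sum_j ((a_j' x)^2 - y_j)^2.

    The rows of A are the sensing vectors a_j.
    """
    m = A.shape[0]
    proj = A @ x                  # a_j' x for every j
    residual = proj ** 2 - y      # mismatch in the quadratic measurements
    return A.T @ (residual * proj) / m

def thresholded_step(x, A, y, step_size, tau):
    """One gradient step followed by soft-thresholding (fixed tau is an assumption)."""
    return soft_threshold(x - step_size * loss_gradient(x, A, y), tau)

# Hypothetical simulated data: a 3-sparse signal in dimension 50.
rng = np.random.default_rng(0)
m, p = 400, 50
x_true = np.zeros(p)
x_true[:3] = [1.0, -1.0, 0.5]
A = rng.standard_normal((m, p))                          # Gaussian sensing vectors
y = (A @ x_true) ** 2 + 0.01 * rng.standard_normal(m)    # small Gaussian noise

x_next = thresholded_step(x_true, A, y, step_size=0.1, tau=0.05)
```

Starting the step at `x_true` only shows that the thresholding keeps the iterate sparse near the truth; a practical run would need an initial estimate and many iterations.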
Sparse PCA: Optimal rates and adaptive estimation
Principal component analysis (PCA) is one of the most commonly used
statistical procedures with a wide range of applications. This paper considers
both minimax and adaptive estimation of the principal subspace in the high
dimensional setting. Under mild technical conditions, we first establish the
optimal rates of convergence for estimating the principal subspace which are
sharp with respect to all the parameters, thus providing a complete
characterization of the difficulty of the estimation problem in terms of the
convergence rate. The lower bound is obtained by calculating the local metric
entropy and an application of Fano's lemma. The rate optimal estimator is
constructed using aggregation, which, however, might not be computationally
feasible. We then introduce an adaptive procedure for estimating the principal
subspace which is fully data driven and can be computed efficiently. It is
shown that the estimator attains the optimal rates of convergence
simultaneously over a large collection of the parameter spaces. A key idea in
our construction is a reduction scheme which reduces the sparse PCA problem to
a high-dimensional multivariate regression problem. This method is potentially
also useful for other related problems.
Comment: Published at http://dx.doi.org/10.1214/13-AOS1178 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Optimal Hypothesis Testing for High Dimensional Covariance Matrices
This paper considers testing a covariance matrix Σ in the high-dimensional setting where the dimension p can be comparable to or much larger than the sample size n. The problem of testing the hypothesis H0: Σ = Σ0 for a given covariance matrix Σ0 is studied from a minimax point of view. We first characterize the boundary that separates the testable region from the non-testable region by the Frobenius norm when the ratio of the dimension p to the sample size n is bounded. A test based on a U-statistic is introduced and is shown to be rate optimal over this asymptotic regime. Furthermore, it is shown that the power of this test uniformly dominates that of the corrected likelihood ratio test (CLRT) over the entire asymptotic regime under which the CLRT is applicable. The power of the U-statistic based test is also analyzed when p/n is unbounded.
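To make the idea of a U-statistic for the Frobenius distance concrete, here is a minimal sketch. It assumes mean-zero observations and the special case Σ0 = I, and it uses the kernel h(X1, X2) = (X1' X2)^2 - (X1' X1 + X2' X2) + p, whose expectation is tr(Σ^2) - 2 tr(Σ) + p, the squared Frobenius norm of Σ - I. The kernel, the function name, and the simulated data are assumptions for this example; the abstract does not spell out the exact statistic or its critical value.

```python
import numpy as np

def frobenius_u_statistic(X):
    """Average of h(X_i, X_j) over all pairs i < j.

    X is an (n, p) array of mean-zero observations. The result is an
    unbiased estimate of ||Sigma - I||_F^2 under the assumed kernel.
    """
    n, p = X.shape
    G = X @ X.T                        # Gram matrix: G[i, j] = X_i' X_j
    sq_norms = np.diag(G)              # X_i' X_i
    i_idx, j_idx = np.triu_indices(n, k=1)
    h = G[i_idx, j_idx] ** 2 - (sq_norms[i_idx] + sq_norms[j_idx]) + p
    return h.mean()

# Hypothetical data drawn under H0 (Sigma = I) with n = 200, p = 50.
rng = np.random.default_rng(0)
X_null = rng.standard_normal((200, 50))
t_null = frobenius_u_statistic(X_null)
```

Under H0 the statistic fluctuates around zero, while under an alternative it concentrates around the squared Frobenius distance, so large values are evidence against H0. Turning it into a test still requires a variance estimate or a calibrated threshold, which this sketch omits.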
Theoretical Foundations of t-SNE for Visualizing High-Dimensional Clustered Data
This paper investigates the theoretical foundations of the t-distributed
stochastic neighbor embedding (t-SNE) algorithm, a popular nonlinear dimension
reduction and data visualization method. A novel theoretical framework for the
analysis of t-SNE based on the gradient descent approach is presented. For the
early exaggeration stage of t-SNE, we show its asymptotic equivalence to power
iterations based on the underlying graph Laplacian, characterize its limiting
behavior, and uncover its deep connection to Laplacian spectral clustering, and
fundamental principles including early stopping as implicit regularization. The
results explain the intrinsic mechanism and the empirical benefits of such a
computational strategy. For the embedding stage of t-SNE, we characterize the
kinematics of the low-dimensional map throughout the iterations, and identify
an amplification phase, featuring the intercluster repulsion and the expansive
behavior of the low-dimensional map, and a stabilization phase. The general
theory explains the fast convergence rate and the exceptional empirical
performance of t-SNE for visualizing clustered data, brings forth
interpretations of the t-SNE visualizations, and provides theoretical guidance
for applying t-SNE and selecting its tuning parameters in various applications.
Comment: Accepted by Journal of Machine Learning Research
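The link between power iterations on a graph Laplacian and cluster structure can be seen in a toy example. The sketch below is not the paper's analysis of t-SNE: it simply repeats the update Y ← (I - hL)Y for a hand-built affinity matrix with two disconnected clusters. The affinity matrix `W`, the step size, the iteration count, and the function names are all assumptions made for this illustration.

```python
import numpy as np

def graph_laplacian(W):
    """Unnormalized graph Laplacian L = D - W for a symmetric affinity matrix W."""
    return np.diag(W.sum(axis=1)) - W

def laplacian_power_iterations(Y, W, step_size, n_iter):
    """Repeatedly apply Y <- (I - step_size * L) Y."""
    M = np.eye(W.shape[0]) - step_size * graph_laplacian(W)
    for _ in range(n_iter):
        Y = M @ Y
    return Y

# Two clusters of 5 points each: affinity 1 within a cluster, 0 across clusters.
block = np.ones((5, 5)) - np.eye(5)
zeros = np.zeros((5, 5))
W = np.block([[block, zeros], [zeros, block]])

rng = np.random.default_rng(0)
Y0 = rng.standard_normal((10, 2))      # random 2-D starting positions
Y = laplacian_power_iterations(Y0, W, step_size=0.1, n_iter=60)
```

In this toy setting each cluster contracts toward its own initial centroid, because the iteration damps every direction except those that are constant within a connected component. Stopping after a moderate number of iterations is what keeps the clusters separate in realistic, weakly connected graphs.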